Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Journal Article

Title:

A survey on short text similarity measurement methods

Author(s):

Rabiei Zadeh Ahmad | Amirkhani Hossein

Journal:

SIGNAL AND DATA PROCESSING

Issue Info:

Year:
2023
Volume:
20
Issue:
3
Pages:
103-126

Keywords:

short text similarity Q4

lexical similarity

semantic similarity

natural language processing

sentence embedding

transformer

Abstract:

Measuring the similarity between two text snippets is an essential task in many NLP problems, and it remains one of the most challenging tasks in the field. Various methods have been proposed to measure text similarity. This survey reviews more than 150 related papers, introduces a comprehensive taxonomy with three main categories, and discusses the advantages and disadvantages of these methods. The first category is lexical methods, which focus only on the surface similarity of a text pair. These methods treat the text as a sequence of characters, tokens, or a mixture of the two. Some recent studies use deep learning techniques to detect lexical similarity in the alias detection task. The second category is semantic methods, which take into consideration the meaning of the words based on pre-prepared knowledge bases like WordNet or on corpus-based methods. Some recent studies use modern deep learning techniques like transformers and Siamese networks to create document embeddings that outperform other methods. The final category is hybrid methods, which take advantage of all other methods, and even syntactic parsing in some cases. Note that high-quality syntactic parsers are not available for many languages and that using them has some side effects on performance and speed.
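
As a rough illustration of the lexical category described in this abstract, the sketch below compares two texts first as token sets and then as character n-gram sets, using Jaccard overlap. It is not taken from the survey: the whitespace tokenizer, the choice of Jaccard, the trigram default, and all function names are assumptions made for the example.

```python
def tokens(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def char_ngrams(text, n=3):
    """Return the set of character n-grams of the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty inputs are treated as identical (assumption)
    return len(a & b) / len(a | b)


def token_similarity(t1, t2):
    """Surface similarity when texts are viewed as sets of tokens."""
    return jaccard(tokens(t1), tokens(t2))


def char_similarity(t1, t2, n=3):
    """Surface similarity when texts are viewed as sets of character n-grams."""
    return jaccard(char_ngrams(t1, n), char_ngrams(t2, n))
```

Character n-grams tend to be more tolerant of spelling variants than whole tokens, which is one reason a method might mix the two views.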

Yearly Impact:

Download 68

Citation 0

Reference 0

Journal Article

Title:

Investigating Text Power in Predicting Semantic Similarity

Author(s):

Journal:

INTERNATIONAL JOURNAL OF INFORMATION SCIENCE AND MANAGEMENT

Issue Info:

Year:
2019
Volume:
17
Issue:
1
Pages:
17-31

Keywords:

Distributional Semantics

Semantic Similarity

Textual Similarity

Effectiveness

Information Retrieval

MeSH

Abstract:

This article presents an empirical evaluation of the distributional semantic power of the abstract, body, and full text, as different text levels, in predicting semantic similarity, using a collection of open access articles from PubMed. Semantic similarity is measured by two criteria, namely linear MeSH term intersection and hierarchical MeSH term distance. A random sample of 200 queries and 20000 documents is selected from a test collection built on the CITREC open source code. The SimPack Java library is used to calculate the textual and semantic similarities. The nDCG value corresponding to each of the two semantic similarity criteria is calculated at three precision points. Finally, the nDCG values are compared using the Friedman test to determine the power of each text level in predicting semantic similarity. The results showed the effectiveness of the text in representing semantic similarity: texts with maximum textual similarity are also shown to be 77% and 67% semantically similar in terms of the linear and hierarchical criteria, respectively. Furthermore, text length is found to be more effective in representing the hierarchical semantic similarity than the linear one. Based on the findings, it is concluded that when the subjects are homogeneous in the tree of knowledge, abstracts provide effective semantic capabilities, while in heterogeneous milieus, full-text processing or knowledge bases are needed to achieve IR effectiveness.
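
The nDCG measure mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the common log2 discount, computes the ideal ranking from the retrieved list only, and uses hypothetical function names.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance scores."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant item at all: define nDCG as 0 (assumption)
    return dcg_at_k(relevances, k) / ideal
```

Calling `ndcg_at_k` with different values of `k` corresponds to evaluating the ranking at different cut-off points.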

Yearly Impact:

Download 108

Citation 0

Reference 0

Seminar Article

Title:

A Framework For Scalable Similarity Evaluation in Text Graphs

Writer:

Samani Mahdi | GHADIRI NASSER

Conference:

INTERNATIONAL CONFERENCE ON WEB RESEARCH

Issue Info:

Year:
2021
Volume:
7

Keywords:

Graph Database

Semantic Similarity

Apache Spark

Unsupervised Learning

BERT

Selective Weighting

Distributed Algorithm

Abstract:

Graphs and graph databases are applicable over a wide range of domains, including text mining and web mining. Using graphs to represent relationships between entities provides enriched models for emerging tasks of web search and information retrieval. Natural language processing algorithms use graphs to model structural relationships of texts efficiently, resulting in improved performance. However, the need to increase the accuracy of graph construction and weight allocation remains a fundamental challenge. Existing methods for these tasks provide limited efficiency and lack scalability for large graphs. In this study, we propose a novel graph-based method for text modeling and for running a query to evaluate the similarity of text segments. In this method, the graph corresponding to the text is first created by modeling words and named entities with the state-of-the-art pre-trained BERT model. Graph nodes are then weighted in two stages. In the first stage, the nodes with more generalization obtain higher weights. The second weighting stage uses the graph obtained from the query text. In this weighting step, nodes are considered important if they are specifically related to the query text. After the important nodes in the graph are determined, the semantic similarity between the query text and the texts in the database is measured. The whole framework runs as a natural language processing pipeline on the scalable Apache Spark platform. The efficiency of the model was evaluated in both distributed and non-distributed configurations, along with its scalability on a Spark cluster. Evaluation of the accuracy using the Pearson correlation coefficient shows that the proposed method achieves higher performance than its competitors.
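
The abstract reports accuracy in terms of the Pearson correlation coefficient between predicted similarity scores and reference scores. The sketch below shows only that evaluation step in plain Python; it does not reproduce the graph construction, the BERT modeling, or the Spark pipeline, and the function name and error handling are assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the final ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```

A value close to 1 would indicate that the predicted similarities rank and scale the text pairs much like the reference judgments do.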

Yearly Impact:

Download 0

Journal Article

Title:

Detection of plagiarism in scientific texts based on text blocking and cosine similarity criteria

Author(s):

Majma Negar | Bashtin Sara

Journal:

SOFT COMPUTING

Issue Info:

Year:
2022
Volume:
11
Issue:
1
Pages:
0-0

Keywords:

Plagiarism

Recognizing the authenticity of scientific texts

Cosine distance

Block text

Text processing

Abstract:

In the last decade, with the expansion of the World Wide Web, the speed and ease of access to ideas, documents, articles, manuscripts, and data collected by others have increased. This has made the exchange of information and ideas between researchers and producers of science easier, but it has also made it easier to make unauthorized copies, write summaries without mentioning the source, and steal literary texts in general. Since universities and educational centers make scientific and research resources easily available to most users, recognizing the authenticity of scientific texts in these centers is more important and, of course, more sensitive. In this research, a method is presented that compares related parts by blocking document parts. In the proposed method, after the documents are classified into two categories, main documents and suspicious documents, preprocessing is done with the aim of eliminating stop words and new wording. Then the documents are segmented and, using cosine similarity, the degree of similarity of the texts to each other is determined. In a test on 50 documents from the data set, the proposed method has an accuracy of 94%, which is an improvement of 2% compared to one of the similar methods.
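
The pipeline outlined in this abstract (preprocess, segment into blocks, compare with cosine similarity) can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the tiny stop-word list, the fixed block size in words, the term-frequency vectors, and the use of the maximum block-pair score are all hypothetical choices.

```python
import math
from collections import Counter

# Illustrative stop-word list only; a real system would use a full list.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def preprocess(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def blocks(words, size):
    """Cut a word list into consecutive blocks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two word lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def max_block_similarity(source, suspicious, size=5):
    """Highest cosine similarity over all (source block, suspicious block) pairs."""
    src = blocks(preprocess(source), size)
    sus = blocks(preprocess(suspicious), size)
    return max((cosine(s, t) for s in src for t in sus), default=0.0)
```

Comparing block against block rather than whole document against whole document lets a short copied passage stand out even when the rest of the two documents differ.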

Yearly Impact:

Download 0

Citation 0

Reference 0

Journal Article

Title:

A FUZZY APPROACH FOR AMBIGUITY REDUCTION IN TEXT SIMILARITY ESTIMATION (CASE STUDY: PERSIAN WEB CONTENTS)

Author(s):

AHANGARBAHAN HAMID | MONTAZER GOLAM ALI

Journal:

JOURNAL OF INFORMATION SYSTEMS AND TELECOMMUNICATION

Issue Info:

Year:
2015
Volume:
3
Issue:
4
Pages:
216-223

Keywords:

STRUCTURAL SIMILARITY

PERSIAN TEXT

Abstract:

Finding similar web content is highly useful in the academic community and in software systems. The literature offers many methods and metrics for measuring the extent of text similarity among documents, with applications especially in plagiarism detection systems. However, most of them do not take into account the ambiguity inherent in word or text-pair comparisons obtained from linguistic experts, nor structural features. As a result, previous methods were not accurate enough to deal with vague information. Using structural features and considering the ambiguity inherent in words therefore improves the identification of similar content. In this paper, a new method is proposed that takes lexical and structural features into consideration in text similarity measures. After preprocessing and removing stop words, each text was divided into general words and domain-specific knowledge words. For each part, appropriate features and measures are extracted. Then, two fuzzy inference systems, one lexical and one structural, were designed to assess lexical and structural text similarity, respectively. The proposed method was evaluated on Persian paper abstracts from the International Conference on e-Learning and e-Teaching (ICELET) corpus. The results show that the proposed method achieves 75% precision and detects 81% of the similar cases.

Yearly Impact:

Download 157

Citation 0

Reference 0

Journal Article

Title:

Automatic Keyword Extraction from Persian short Text Using word2vec

Author(s):

Hajipoor O. | SADIDPOUR S.S.

Journal:

JOURNAL OF ELECTRONIC AND CYBER DEFENCE

Issue Info:

Year:
2020
Volume:
8
Issue:
2 (30)
Pages:
105-114

Keywords:

Keyword Extraction Q1

Abstract:

With the growing number of Persian electronic documents and texts, quick and inexpensive methods for accessing desired texts from this extensive collection are becoming more important. One effective technique for achieving this goal is extracting the keywords that represent the main concept of the text. For this purpose, the frequency of a word in the text cannot by itself be a proper indication of its significance and crucial role. Also, most keyword extraction methods ignore the concept and semantics of the text. On the other hand, the unstructured nature of new texts in news and electronic documents makes it difficult to extract these words. In this paper, an automated, unsupervised method is proposed for keyword extraction from Persian texts that lack a proper structure. This method not only takes into account the probability of occurrence of a word and its frequency in the text, but also captures the concept and semantics of the text by training a word2vec model on the text. In the proposed method, which is a combination of statistical and machine learning methods, after word2vec is trained on the text, the words that have the smallest distance to other words are extracted. Then, a statistical equation is proposed to calculate the score of each extracted word using co-occurrence and frequency. Finally, the words with the highest scores are selected as the keywords. The evaluations indicate that the method achieves an F-measure of 53.92%, which is 11% higher than other methods.

Yearly Impact:

Download 0

Citation 0

Reference 0

Journal Article

Title:

PHONOLOGICAL SIMILARITY EFFECT ON SPAN OF VERBAL SHORT TERM MEMORY IN PERSIAN

Author(s):

JAHAN ALI | BARATZADEH S. | NILIPOUR R.

Journal:

MEDICAL JOURNAL OF TABRIZ UNIVERSITY OF MEDICAL SCIENCES

Issue Info:

Year:
2008
Volume:
30
Issue:
3
Pages:
37-40

Keywords:

VERBAL SHORT TERM MEMORY Q3

MEMORY SPAN Q3

PHONOLOGICAL SIMILARITY EFFECT

SONORITY OF VOWEL

Abstract:

Background and Objectives: Many studies have been done to understand the nature and mechanisms of verbal short term memory. These studies have led to linguistic and nonlinguistic approaches to it. The phonological similarity effect, an important finding of these studies, increased the conflict between the two approaches. Given the differences between languages, cross-language investigations may be helpful. The aim of this study is to investigate the effect of phonological similarity on the span of verbal short term memory in the Persian language.

Material and Methods: In this descriptive analytic study, 16 graduate and postgraduate students (mean age 20 years, SD = 2.03) participated (4 males, the remainder females). All participants were monolingual native Persian speakers without any speech or hearing disorders. Stimuli were 450 words categorized into 3 different lists, namely a rhyming words list, an alliterative words list, and a dissimilar words list. Each list consisted of twenty-five 6-word sequences (150 words in each list). Stimuli were presented via a speaker. There was a 1 second interval between words in each sequence. Three seconds after each sequence was presented, a signal was heard as a sign to start the recall.

Results: A one-way ANOVA test showed a significant difference between rhyming, alliterative, and dissimilar words (p = 0.0000). A post hoc Tukey test showed a significant difference between the rhyming list and the dissimilar list (0.000). A significant difference was also shown between the alliterative and dissimilar lists (0.006). There was no difference between the rhyming and alliterative lists.

Conclusion: These data suggest that in rhyming and alliterative words, the vowel, because of its higher sonority (compared with other phonemes), enhances the memory span as a cueing feature. Cross-language differences, especially in the sonority level of phonemes, may cause different phonological similarity effects among languages. Since verbal short term memory is sensitive to the vowel in words, it seems that verbal short term memory has a linguistic nature.

Yearly Impact:

Download 0

Citation 1

Reference 0

Journal Article

Title:

ADVERTISING KEYWORDS RECOMMENDATION FOR SHORT-TEXT WEB PAGES USING WIKIPEDIA

Author(s):

ZHANG W.

Journal:

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY (TIST)

Issue Info:

Year:
2012
Volume:
3
Issue:
2
Pages:
1-25

Keywords:

Abstract:

Yearly Impact:

Download 0

Citation 1

Reference 0

Journal Article

Title:

Convolutional deep belief network based short text classification on Arabic corpus

Author(s):

Journal:

COMPUTER SYSTEMS SCIENCE AND ENGINEERING

Issue Info:

Year:
2023
Volume:
45
Issue:
3
Pages:
3097-3113

Keywords:

Abstract:

Yearly Impact:

Download 0

Citation 1

Reference 0

Seminar Article

Title:

EVALUATION OF THE EFFECTIVENESS OF TEXT CLASSIFICATION ALGORITHMS ON PERSIAN SHORT TEXTS

Writer:

Annamoradnejad Issa | Habibi jafar

Conference:

INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY, COMPUTER AND TELECOMMUNICATIONS

Issue Info:

Year:
2017
Volume:
3

Keywords:

TEXT CLASSIFICATION

SHORT TEXT

NEWS GENRES

TEXT MINING

Abstract:

Most classification algorithms have been devised to classify long texts, such as email and web pages, which has overshadowed their effectiveness on short and sometimes informal texts. In this paper, we evaluated the accuracy of four major classification algorithms on Persian short texts. These algorithms are naïve Bayes, k-nearest neighbors, decision trees, and support vector machine. First, we briefly introduce their overall method and provide some basic information, and then we apply these algorithms to one specific dataset to measure their effectiveness. Results show that the naïve Bayes algorithm functions comparatively better than the others, while the KNN algorithm has the least accuracy.
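
For readers unfamiliar with the best-performing algorithm named in this abstract, the following is a minimal sketch of a naive Bayes text classifier. The abstract does not say which variant or settings the authors used; the multinomial model, add-one smoothing, whitespace tokenization, and the class name `NaiveBayes` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # all words seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A toy usage, with made-up English training snippets standing in for short news texts:

```python
model = NaiveBayes().fit(
    ["goal match team", "election vote party"],
    ["sport", "politics"],
)
label = model.predict("match team")
```

Short texts give each class very few word counts to work with, which is why smoothing matters in this setting.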

Yearly Impact:

Download 0

Scientific Information Database

ISSN: 2588-4824
